A Commuter’s Challenge in Suburban Philadelphia

Every morning, Midge, a healthcare worker, embarks on her daily journey from the quiet streets of Pottstown1 in Montgomery County to the bustling corridors of the Hospital of the University of Pennsylvania in Philadelphia. Her commute is more than just a physical journey—it’s a daily ordeal, wrestling with the infrequent and unreliable bus service that connects the suburban sprawl to the urban core. On a good day, Midge waits 30 minutes for Bus 93; on a bad day, the wait can exceed an hour. She must then transfer to the Norristown High Speed Line, extending her total morning commute to over 2.5 hours—five times longer than Philadelphia’s average commute time.2 This inconsistency isn’t just a personal inconvenience; it’s a stark reflection of the broader issues of transit inequities that plague many in the suburbs, where buses are few and far between despite the pressing need.

Midge’s story is not unique in Greater Philadelphia—a region where 58.2 percent of public transit users are women and 66.4 percent are people of color. These individuals predominantly populate essential sectors like healthcare, education, and social assistance which employs 25.6 percent of suburban county workers who depend on public transportation. Despite playing critical roles in their communities, these workers earn median annual earnings approximately $7,673 below the regional average. This economic disparity is exacerbated by the transit challenges they face, where the gap in accessibility and frequency of service widens the divide between different socioeconomic groups. Additionally, a significant portion of the workforce, about 35%, lives below the poverty line, heavily relying on this patchwork public transport system that struggles to meet their daily needs.3

The Importance of Transit for Suburban Communities

Midge’s daily commute sheds light on a more significant issue: the critical role of public transportation in providing access to essential services and economic opportunities, especially in suburban communities. Studies highlight a robust negative correlation between unemployment rates and proximity to transit access. In New York City, neighborhoods with some but limited transit access experienced a significantly higher unemployment rate (12.6%) compared to areas with robust transit options (8.1%). 4 This lack of transit access limits physical and economic mobility, contributing to income inequality and higher public assistance dependency.

Moreover, reliable public transportation is pivotal for healthcare access. The National Health Interview Survey (2002) estimated that 3.6 million people miss non-emergency medical appointments annually due to transportation issues. This issue disproportionately affects communities of color, who rely more heavily on public transportation compared to white populations. 5 Midge’s commute is not just a matter of convenience; it’s about accessing crucial healthcare services that manage chronic conditions and provide necessary treatments.

Transportation also plays a fundamental role in educational access. The “geography of opportunity” significantly shapes educational outcomes through accessible transportation, enabling children and young people to reach schools and participate in extracurricular activities.6 A study in Minneapolis found that high school students with access to an unlimited bus and rail pass attended school more regularly and achieved higher GPAs by 0.23 than students without a pass, 7 highlighting the importance of reliable transit in educational attainment.

Using the latest SEPTA General Transit Feed Specification (GTFS) data8, it reveals that a staggering 80% of SEPTA’s trips are made by bus, showcasing the critical role of bus transit in Philadelphia’s transportation network. This statistic underlines the heavy reliance on buses across the city, not just in suburban areas.

# Import GTFS data
stops <- read.csv("data/google_bus/stops.txt", header = TRUE, sep = ",")
stop_times <- read.csv("data/google_bus/stop_times.txt", header = TRUE, sep = ",")
trips <- read.csv("data/google_bus/trips.txt", header = TRUE, sep = ",")
shapes <- read.csv("data/google_bus/shapes.txt", header = TRUE, sep = ",")
routes <- read.csv("data/google_bus/routes.txt", header = TRUE, sep = ",")
calendar <- read.csv("data/google_bus/calendar.txt", header = TRUE, sep = ",")
calendar_dates <- read.csv("data/google_bus/calendar_dates.txt", header = TRUE, sep = ",")

# Import Swiftly data
swiftly <- st_read("data/SwiftlyShapes/geoJSONs/SEPTASurface_HighResSpeed_v2.json")
# Join stop_times and trips, from now on dat will be our main data set
dat <- left_join(stop_times, trips, by = "trip_id")

# join dat and routes
dat <- left_join(dat, routes, by = "route_id")

#determine route_type
dat <- dat %>%
  mutate(mode = if_else(route_type == 0, "Trolley", ifelse(route_type == 1, "Metro", ifelse(route_type == 3, "Bus", "Trolleybus"))))

# Mode share (not ridership)
dat_trip_by_mode <- dat %>%
  group_by(mode) %>%
  summarise(mode_count = n(), .groups = 'drop') %>%
  mutate(total_count = sum(mode_count),
         mode_share = round(mode_count/sum(mode_count)*100, digits = 2)) %>%
  ungroup() 
  
# # Plot mode share using ggplot2 and the custom palette
ggplot(dat_trip_by_mode, aes(x = mode, y = mode_share, fill = mode)) +
  geom_bar(stat = "identity", width = 0.75) +
  scale_fill_manual(values = bus_revolution_palette(length(unique(dat_trip_by_mode$mode)))) +
  geom_text(aes(label = paste0(mode_share, "% (", mode_count, " trips)")),
            vjust = -0.5, color = "black", size = 2.5) +
  labs(title = "Mode Share by Transit Type",
       x = "Mode",
       y = "Mode Share (%)",
       fill = "Mode",
       caption = "Source: SEPTA GTFS Data 2024") +
  theme_minimal() +
  theme(legend.position = "right",
        plot.title = element_text(hjust = 0.5))

# 
# # total trip bus & trolley bus 1.980.505, total observations 2.026.677
# 
# ggplot() +
#   # Add service period background rectangles with reduced alpha for subtlety
#   geom_rect(data = service_periods, aes(xmin = start_hour, xmax = end_hour, ymin = 0, ymax = max(hourly_trips$number_of_trips), fill = peak_category), alpha = 0.2) +
#   # Add the line plot for weekday trips with adjusted point sizes
#   geom_line(data = hourly_trips %>% filter(final_day_type == "Weekday"), aes(x = hour, y = number_of_trips), color = "#F2AE2E", size = 1) +
#   geom_point(data = hourly_trips %>% filter(final_day_type == "Weekday"), aes(x = hour, y = number_of_trips), color = "#26A699", size = 3) +
#   # Add the line plot for weekend trips with a new color for better differentiation
#   geom_line(data = hourly_trips %>% filter(final_day_type == "Weekend"), aes(x = hour, y = number_of_trips), color = "#0072B2", size = 1) +
#   geom_point(data = hourly_trips %>% filter(final_day_type == "Weekend"), aes(x = hour, y = number_of_trips), color = "#D55E00", size = 3) +
#   # Define scales and labels with appropriate adjustments
#   scale_x_continuous(breaks = 0:23) +
#   scale_y_continuous(labels = scales::comma) +
#   scale_fill_manual(values = service_period_colors) +
#   scale_color_manual(values = c("Weekday" = "#F2AE2E", "Weekend" = "#0072B2")) +
#   labs(title = "Number of Trips by Hour with Service Periods",
#        x = "Hour of Day",
#        y = "Number of Trips",
#        fill = "Service Period",
#        color = "Day Type") +
#   theme_minimal() +
#   theme(legend.position = "bottom",
#         plot.title = element_text(hjust = 0.5))
# CREATE LINE GRAPH OF TRIPS BY SERVICE PERIODS

# Function to normalize times greater than 24:00:00. Important step: check the stop_times$arrival_time and departure_time values if it's greater than 24:00 
normalize_time <- function(time_str) {
  time_parts <- strsplit(time_str, ":")[[1]]
  hours <- as.numeric(time_parts[1])
  minutes <- as.numeric(time_parts[2])
  seconds <- as.numeric(time_parts[3])
  
  if (hours >= 24) {
    hours <- hours - 24
    time_str <- sprintf("%02d:%02d:%02d", hours, minutes, seconds)
  }
  
  return(time_str)
}

#identify peak_pm category 
dat <- dat %>%
  mutate(departure_time = sapply(departure_time, normalize_time), # departure_time's normalized
         departure_time = lubridate::hms(departure_time),
         hour = hour(departure_time),  # Extract hour component for easier condition checking
         peak_category = if_else(hour >= 4 & hour < 6, "Early Morning",
                                    if_else(hour >= 6 & hour < 9, "AM Peak",
                                            if_else(hour >= 9 & hour < 15, "Mid Day",
                                                    if_else(hour >= 15 & hour < 18, "PM Peak",
                                                            if_else(hour >= 18 & hour < 22, "Evening", 
                                                                    ifelse(hour >= 22 & hour < 24, "Late Night", "Owl"))))))) 

# # Check if there are any NA values in the 'departure_time' column
# summarise(dat,
#     na_count = sum(is.na(departure_time)), #Count NA values
#     total_rows = n(),  # Total number of rows for context
#     percentage_na = na_count / total_rows * 100  # Percentage of NA values
#   )

durations <- c("Owl" = 4, "Early Morning" = 2, "AM Peak" = 3, "Mid Day" = 6, "PM Peak" = 3, "Evening" = 4, "Late Night" = 2)

dat_service_period <- dat %>%
  filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
  group_by(peak_category)%>%
  summarise(peak_count = n(), .groups = 'drop') %>%
  mutate(total_count = sum(peak_count),
         peak_share = round(peak_count/sum(peak_count)*100, digits = 2)) %>%
  arrange(factor(peak_category, levels = c("Owl", "Early Morning", "AM Peak", "Mid Day", "PM Peak", "Evening", "Late Night"))) %>%
  ungroup()

# Create a data frame for service periods
service_periods <- data.frame(
  peak_category = c("Owl", "Early Morning", "AM Peak", "Mid Day", "PM Peak", "Evening", "Late Night"),
  start_hour = c(0, 4, 6, 9, 15, 18, 22),
  end_hour = c(4, 6, 9, 15, 18, 22, 24)
)

# CREATE BAR CHART OF TRIPS BY SERVICE TYPES

# Ensure 'calendar' has 'day_type' defined
calendar <- calendar %>%
  mutate(
    day_type = case_when(
      monday == 1 & tuesday == 1 & wednesday == 1 & thursday == 1 & friday == 1 & saturday == 0 & sunday == 0 ~ "Weekday",
      (saturday == 1 | sunday == 1) ~ "Weekend",
      TRUE ~ "irregular"
    )
  )

# Calendar_dates includes 'service_id', 'date', and 'exception_type'
# Ensure 'calendar_dates' has 'day_type' defined based on the date
calendar_dates <- calendar_dates %>%
  filter(exception_type == 1) %>% #we only need exception_type == 1, those added to the service
  mutate(
    date = ymd(date),  # Convert date format if necessary
    day_of_week = wday(date, label = TRUE, week_start = 1),
    day_type = case_when(
      day_of_week %in% c("Mon", "Tue", "Wed", "Thu", "Fri") ~ "Weekday",  # Monday to Friday
      TRUE ~ "Weekend"  # Saturday and Sunday
    )
  )

# Join calendar_dates with calendar
calendar_joined <- calendar_dates %>%
  full_join(calendar, by = "service_id", suffix = c(".dates", ".cal")) %>%
  mutate(
    final_day_type = case_when(
      # Use calendar_dates' day_type if exception_type indicates added service
      exception_type == 1 & day_type.dates == "Weekday" ~ "Weekday",
      exception_type == 1 & day_type.dates == "Weekend" ~ "Weekend",
      # Default to calendar's day_type where there's no specific exception noted
      TRUE ~ day_type.cal
    )
  ) %>%
  select(service_id, final_day_type) %>%
  group_by(service_id, final_day_type) %>%
  summarise(count = n(), .groups = 'drop') %>%
  arrange(service_id, desc(count)) %>%
  distinct(service_id, .keep_all = TRUE) %>% # Make sure no double day types for service ID
  select(-count)
  
dat <- dat %>%
  left_join(calendar_joined, by = "service_id")
# SET UP STOPS CATEGORY. This step is crucial as we'll use stops information and join them for each kind of maps 

# Convert stops data to sf object
stops <- stops %>%
  st_as_sf(coords = c("stop_lon", "stop_lat"), crs = 4326, remove = FALSE)

# Load PA and SEPTA region map for base map
PA <- st_read("data/PaCounty2024_05.geojson")
SEPTA_region_map <- PA[PA$COUNTY_NAM %in% c("MONTGOMERY", "BUCKS", "PHILADELPHIA", "DELAWARE", "CHESTER"), ] %>%
  select(PA_CTY_COD, COUNTY_NAM, FIPS_COUNT)

# Load Center City boundary
center_city <- st_read("data/20240708_CenterCity/CenterCity.shp")

# Ensure all datasets are in the same CRS
SEPTA_region_map <- st_transform(SEPTA_region_map, crs = st_crs(stops))
center_city <- st_transform(center_city, crs = st_crs(stops))

# Perform the spatial intersection for county
stops <- stops %>%
  st_join(SEPTA_region_map, left = TRUE, join = st_intersects) %>%
  rename(county = COUNTY_NAM) %>%
  mutate(center_city = st_within(stops, center_city, sparse = FALSE))%>%
  mutate(center_city = rowSums(center_city) > 0)  # Convert logical matrix to vector

# Categorize the stops
stops <- stops %>%
  mutate(category = case_when(
    county == "PHILADELPHIA" & center_city == TRUE ~ "PHILADELPHIA-Center_City",
    county == "PHILADELPHIA" & center_city == FALSE ~ "PHILADELPHIA-Not_Center_City",
    TRUE ~ county
  )) %>%
  select(-"center_city")
# SET UP HOURLY AVERAGE TRIP DATA FOR EACH BUS STOP

dat_trip_count_hr <- dat %>%
  filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
  group_by(stop_id, mode, final_day_type) %>%
  summarise(trip_count = n(), 
            All_Routes = paste(unique(route_id), collapse = ", "),
            .groups = 'drop') %>%
  mutate(avg_trip_hr = round(trip_count/24)) %>%
  mutate(log_avg_trip_hr = round(log1p(avg_trip_hr), digits = 0)) %>%
  mutate(max_freq = ifelse(avg_trip_hr >= 12, "5 MAX",
                           ifelse (avg_trip_hr >= 6, "10 MAX",
                           ifelse (avg_trip_hr >= 4, "15 MAX", 
                           ifelse(avg_trip_hr >= 2, "30 MAX", "60 MAX")))))

dat_trip_count_hr <- left_join(dat_trip_count_hr, stops, by = "stop_id")

dat_trip_count_hr <- st_as_sf(dat_trip_count_hr, coords = c("stop_lon", "stop_lat"), crs = 4326, remove = FALSE)

# Assuming 'county' is the column with the county information
dat_trip_count_hr <- dat_trip_count_hr %>%
  mutate(category = if_else(county == "PHILADELPHIA", "urban", "suburban"))

# Simplify the max frequency into two categories
dat_trip_count_hr <- dat_trip_count_hr %>%
  mutate(simplified_freq = case_when(
    max_freq %in% c("5 MAX", "10 MAX", "15 MAX") ~ "15 minutes or less",
    TRUE ~ "More than 15 minutes"
  ))

# Remove rows where 'category' is NA
dat_trip_count_hr <- dat_trip_count_hr %>% filter(!is.na(category))

# sum(dat_trip_count_hr$trip_count) #1980505 <- Always make sure you retain the number of observation

Unraveling the Disparities in Bus Service Efficiency

The disparities in bus service efficiency between urban and suburban areas are starkly highlighted by frequency data. In suburban areas, the majority of bus stops suffer from infrequent services. This contrasts sharply with urban areas, where bus stops generally provide more frequent services. These differences underline significant inequities in transit accessibility, impacting daily commuters like Midge who rely heavily on these services.

In the following leaflet map, we illustrate these disparities. You’ll see that bus stops in suburban areas predominantly have longer waiting times, exceeding 15 minutes, whereas urban stops typically has waiting time under 15 minutes, demonstrating more frequent services. This visual representation emphasizes the urgent need for targeted improvements in suburban transit systems to bridge the gap and enhance overall service reliability and efficiency.

# SET UP AVERAGE TRIP DATA PER SERVICE PERIOD FOR EACH BUS STOP

dat_trip_count_prd <- dat %>%
  filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
  group_by(stop_id, peak_category, mode, final_day_type) %>%
  summarise(trip_count = n(), 
            All_Routes = paste(unique(route_id), collapse = ", "),
            .groups = 'drop') %>%
  mutate(
    hours = durations[peak_category],
    avg_trip_hr = round(trip_count / hours)
  ) %>%
  select(-hours) %>%
  mutate(log_avg_trip_hr = round(log1p(avg_trip_hr), digits = 0)) %>%
  mutate(max_freq = ifelse(avg_trip_hr >= 12, "5 MAX",
                           ifelse (avg_trip_hr >= 6, "10 MAX",
                           ifelse (avg_trip_hr >= 4, "15 MAX", 
                           ifelse(avg_trip_hr >= 2, "30 MAX", "60 MAX")))))

dat_trip_count_prd <- left_join(dat_trip_count_prd, stops, by = "stop_id")

dat_trip_count_prd <- st_as_sf(dat_trip_count_prd, coords = c("stop_lon", "stop_lat"), crs = 4326, remove = FALSE)

# sum(dat_trip_count_prd$trip_count)  #1980505 <- Always make sure you retain the number of observation

SEPTA_region_map <- st_transform(SEPTA_region_map, st_crs(dat_trip_count_hr))
# Define color palette for simplified frequency using the bus revolution colors
pal <- colorFactor(c("#A6123A", "#655BA6"), domain = c("15 minutes or less", "More than 15 minutes"))

# Create the leaflet map
m_avg_trip_hr <- leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data = SEPTA_region_map, color = "black", weight = 1, fill = FALSE) %>%
  addCircleMarkers(
    data = dat_trip_count_hr %>% filter(final_day_type == "Weekday" & !is.na(category)),
    group = ~category,  # Group data based on urban or suburban
    lng = ~stop_lon, 
    lat = ~stop_lat,
    color = ~pal(simplified_freq),  # Color based on simplified frequency
    radius = 1,
    opacity = 0.8,
    popup = ~paste0("<b>Stop ID:</b> ", stop_id, "<br>",
                    "<b>Stop Name:</b> ", stop_name, "<br>",
                    "<b>Avg Buses/Hr:</b> ", avg_trip_hr, "<br>",
                    "<b>Max Frequency:</b> ", simplified_freq, "<br>",
                    "<b>All Routes:</b> ", All_Routes)
  ) %>%
  addLayersControl(
    overlayGroups = c("urban", "suburban"),
    options = layersControlOptions(collapsed = FALSE)
  ) %>%
  addControl(html = "<div style='background-color: white; padding: 5px;'><strong>Source:</strong> SEPTA GTFS Data 2024</div>", position = "bottomleft")

# Display the map
m_avg_trip_hr

Midge’s daily challenge with inconsistent suburban bus service reflects broader transit inequities between urban and suburban settings. This disparity is exemplified by the operational inefficiencies of suburban routes, highlighted by data indicating these buses frequently fall within the lowest 15th percentile for service performance. This low percentile indicates prolonged wait times and unreliable schedules—factors contributing to higher operational costs and reduced reliability compared to more efficient urban counterparts.

Suburban buses, like the routes Midge uses, often suffer due to lower ridership densities and less optimized route planning, as reflected in the significant number of suburban routes like Routes 118, 129, and 90 underperforming in the 15th percentile. These routes’ inefficiencies starkly contrast with the robust transit systems in urban areas, underscoring a critical gap in transit equity and efficiency that demands innovative solutions.

route_evaluation <- read.csv("data/route_evaluation_report_2019.csv", sep = ";")

# Convert Cost.per.Passenger and Passengers.per.Revenue.Hour into proper numeric columns
route_evaluation$Cost.per.Passenger <- as.numeric(gsub("[$,]", "", route_evaluation$Cost.per.Passenger))
route_evaluation$Passengers.per.Revenue.Hour <- as.numeric(route_evaluation$Passengers.per.Revenue.Hour)

# Calculate 15th percentile for Passengers and 85th percentile for Cost (due to reversed axis)
passenger_15th <- quantile(route_evaluation$Passengers.per.Revenue.Hour, probs = 0.15, na.rm = TRUE)
cost_85th <- quantile(route_evaluation$Cost.per.Passenger, probs = 0.85, na.rm = TRUE)

# Define color palette for the types of buses
route_colors <- c("City Bus" = "#A6123A", "Suburban Bus" = "#655BA6")
pal <- colorFactor(route_colors, domain = unique(route_evaluation$Type))

# Create the ggplot
p <- ggplot(route_evaluation, aes(x = Passengers.per.Revenue.Hour, y = Cost.per.Passenger, color = Type, text = paste("Route:", Route, "<br>Passengers per Revenue Hour:", Passengers.per.Revenue.Hour, "<br>Cost per Passenger:", Cost.per.Passenger, "<br>Type:", Type))) +
  annotate("rect", xmin = 0, xmax = passenger_15th, ymin = max(route_evaluation$Cost.per.Passenger), ymax = cost_85th, fill = "lightcoral", alpha = 0.2) +
  geom_vline(xintercept = passenger_15th, linetype = "dashed", color = "grey") +
  geom_hline(yintercept = cost_85th, linetype = "dashed", color = "grey") +
  geom_point() +
  labs(title = "Performance of SEPTA City Bus and Suburban Bus",
       x = "Passengers per Revenue Hour",
       y = "Cost per Passenger",
       caption = "Source: SEPTA FY 2019 Report") +
  scale_color_manual(values = route_colors) +
  scale_y_reverse(breaks = seq(0, max(route_evaluation$Cost.per.Passenger, na.rm = TRUE), by = 1),
                  labels = dollar_format(prefix = "$", suffix = "", accuracy = 0.01)) +
  scale_x_continuous(breaks = seq(0, max(route_evaluation$Passengers.per.Revenue.Hour, na.rm = TRUE), by = 10)) +
  theme_minimal() +
  theme(legend.position = "none") +
  annotate("text", x = passenger_15th / 2, y = 6.5, label = "15th percentile", size = 3, color = "black", alpha = 0.5)

# Convert to plotly for an interactive plot with detailed hover information
p_plotly <- ggplotly(p, tooltip = "text")

# Display the interactive plot
p_plotly

Although microtransit has been proposed as an alternative to address these inefficiencies by offering more adaptable and potentially efficient services in lower-density areas, its effectiveness remains questionable. For example, AC Transit’s experiment in Oakland to replace fixed routes with microtransit resulted in a cost per passenger that more than doubled, highlighting the challenges of implementing such services effectively.9

This data illustrates that while microtransit is often touted as a modern solution for declining ridership, it generally does not perform as efficiently as traditional fixed-route services. The core efficiency of fixed-route transit comes from its ability to consolidate riders along a predictable path, minimizing deviations and ensuring consistent travel times. In contrast, microtransit’s on-demand nature can lead to significant inefficiencies, with routes constantly changing to accommodate individual pickup requests, leading to increased travel times and operational costs.

Given these challenges, it might be more prudent to explore other alternatives, such as deploying smaller, fixed-route minibusses during peak hours to enhance connectivity and manage crowding more effectively. This approach could leverage the lower operational costs of smaller vehicles while maintaining the structured efficiency of fixed-route transit, potentially offering a more sustainable solution to the suburban transit woes that Midge and countless others face daily.

Impact and Innovation: Enhancing Suburban Bus Services

While the raw data illuminates the service disparities, the true depth of this issue is best captured through the lived experiences of suburban commuters like Midge. Personal stories reveal the broader human impact: missed opportunities, prolonged workdays, and diminished family life due to infrequent bus services. This qualitative insight paints a vivid picture of the daily challenges and underscores the urgency for targeted improvements.

SEPTA’s ongoing “Bus Revolution” project aims to modernize and enhance bus services, promising updated routes and improved frequencies that haven’t been revised since the 1960s. While the initiative seeks to reduce headways to a maximum of 30 minutes even in the most remote suburban areas, it has faced delays and criticism, particularly concerning its potential to disproportionately impact low-income neighborhoods. This tension highlights the complex interplay between improving service efficiency and addressing community concerns of gentrification.

As we conclude our exploration of suburban bus transit disparities in Philadelphia, the compelling narratives of daily commuters, underpinned by robust SEPTA data, paint a clear picture of the challenges and the urgent need for systemic improvements. The “Bus Revolution” project, while a promising step toward enhancing bus frequencies and updating decades-old routes, also highlights the delicate balance required to improve service efficiency without exacerbating social inequities.

Embracing innovative solutions and fostering community engagement in transit planning are crucial. By committing to both technological upgrades and equitable service distribution, SEPTA can ensure that every commuter, regardless of where they live, has reliable and timely access to public transit. This not only improves individual daily experiences but also supports broader goals of economic mobility and sustainable urban development.

As stakeholders, we are called upon to support these initiatives, advocate for transparent and inclusive planning processes, and ensure that the voices of all commuters, especially those most affected like Midge, are heard and addressed in the pursuit of a truly connected and equitable transit system.

---
title: "Waiting for Change: The Urgent Need for Equitable Bus Services in Suburban Philadelphia"
author: "Regy Septian"
date: "2024-11-30"
output: 
  html_document:
    toc: true
    toc_float: true
    code_folding: "hide"
    code_download: true
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache = FALSE)

library(tidyverse)
library(sf)
library(tidytransit)
library(lubridate)
library(hms)
library(tidyr)
library(patchwork)
library(kableExtra)
library(wk)
library(leaflet)
library(leaflet.extras)
library(viridis)
library(viridisLite)
library(htmlwidgets)
library(htmltools)
library(leaflet.extras)
library(forcats)
library(ggplot2)
library(scales)
library(shiny)
library(webshot2)
library(magick)
library(plotly)
library(readr)

# Define the color codes
bus_revolution_colors <- c("#A6123A", "#655BA6", "#26A699", "#F2AE2E", "#F2E5D5")

# Create a custom palette function
bus_revolution_palette <- function(n) {
  colorRampPalette(bus_revolution_colors)(n)
}

workdir <- "~/OneDrive - PennO365/03_UNIVERSITY_OF_PENNSYLVANIA-MCP_2025/04_FALL_24/03_MUSA_6951_Communication_in_Urban_Spatial_Analytics/MUSA-6310_Assignment-2/MUSA-6310_Assignment-2" # Change the directory
setwd(workdir)
```

### A Commuter's Challenge in Suburban Philadelphia  

Every morning, Midge, a healthcare worker, embarks on her daily journey from the quiet streets of Pottstown[^1] in Montgomery County to the bustling corridors of the Hospital of the University of Pennsylvania in Philadelphia. Her commute is more than just a physical journey—it’s a daily ordeal, wrestling with the infrequent and unreliable bus service that connects the suburban sprawl to the urban core. On a good day, Midge waits 30 minutes for Bus 93; on a bad day, the wait can exceed an hour. She must then transfer to the Norristown High Speed Line, extending her total morning commute to over 2.5 hours—five times longer than Philadelphia's average commute time.[^2] This inconsistency isn't just a personal inconvenience; it's a stark reflection of the broader issues of transit inequities that plague many in the suburbs, where buses are few and far between despite the pressing need.

Midge's story is not unique in Greater Philadelphia—a region where 58.2 percent of public transit users are women and 66.4 percent are people of color. These individuals predominantly populate essential sectors like healthcare, education, and social assistance which employs 25.6 percent of suburban county workers who depend on public transportation. Despite playing critical roles in their communities, these workers earn median annual earnings approximately $7,673 below the regional average. This economic disparity is exacerbated by the transit challenges they face, where the gap in accessibility and frequency of service widens the divide between different socioeconomic groups. Additionally, a significant portion of the workforce, about 35%, lives below the poverty line, heavily relying on this patchwork public transport system that struggles to meet their daily needs.[^3] 

[^1]: [WHYY: Tracing Montco’s decades-long shift from GOP stronghold to boon for Biden](https://whyy.org/articles/tracing-montcos-decades-long-shift-from-gop-stronghold-to-boon-for-biden/)
[^2]: [The Philadelphia Inquirer: Philly commutes are getting slightly quicker, but they’re among the longest in the U.S.](https://www.inquirer.com/transportation/philadelphia-commute-time-longest-transit-20241010.html#loaded)
[^3]: [Economy League: Greater Philadelphia’s Public Transportation and COVID-19 – PART 2](https://www.economyleague.org/resources/greater-philadelphias-public-transportation-and-covid-19-part-2#)

### The Importance of Transit for Suburban Communities  

Midge’s daily commute sheds light on a more significant issue: the critical role of public transportation in providing access to essential services and economic opportunities, especially in suburban communities. Studies highlight a robust negative correlation between unemployment rates and proximity to transit access. In New York City, neighborhoods with some but limited transit access experienced a significantly higher unemployment rate (12.6%) compared to areas with robust transit options (8.1%). [^4] This lack of transit access limits physical and economic mobility, contributing to income inequality and higher public assistance dependency.

Moreover, reliable public transportation is pivotal for healthcare access. The National Health Interview Survey (2002) estimated that 3.6 million people miss non-emergency medical appointments annually due to transportation issues. This issue disproportionately affects communities of color, who rely more heavily on public transportation compared to white populations. [^5] Midge’s commute is not just a matter of convenience; it’s about accessing crucial healthcare services that manage chronic conditions and provide necessary treatments.

Transportation also plays a fundamental role in educational access. The "geography of opportunity" significantly shapes educational outcomes through accessible transportation, enabling children and young people to reach schools and participate in extracurricular activities.[^6] A study in Minneapolis found that high school students with access to an unlimited bus and rail pass attended school more regularly and achieved higher GPAs by 0.23 than students without a pass, [^7] highlighting the importance of reliable transit in educational attainment.

Using the latest SEPTA General Transit Feed Specification (GTFS) data[^8], it reveals that a staggering 80% of SEPTA's trips are made by bus, showcasing the critical role of bus transit in Philadelphia’s transportation network. This statistic underlines the heavy reliance on buses across the city, not just in suburban areas. 

[^4]: [Kaufman and colleagues 2015](https://wagner.nyu.edu/files/faculty/publications/JobAccessNov2015.pdf)
[^5]: [Wallace et al. 2015](https://journals.sagepub.com/doi/abs/10.1177/0361198105192400110)
[^6]: [Vincent et al. 2014](https://eric.ed.gov/?id=ED558542)
[^7]: [Fan and Das 2015](https://conservancy.umn.edu/items/5d998968-aafe-4bf0-92d8-e5da5732f4f7)
[^8]: [SEPTA GTFS](https://www3.septa.org/developer/gtfs_public.zip)


```{r import gtfs, message=FALSE, warning=FALSE, cache=TRUE, results = 'hide'}
# Import GTFS data
stops <- read.csv("data/google_bus/stops.txt", header = TRUE, sep = ",")
stop_times <- read.csv("data/google_bus/stop_times.txt", header = TRUE, sep = ",")
trips <- read.csv("data/google_bus/trips.txt", header = TRUE, sep = ",")
shapes <- read.csv("data/google_bus/shapes.txt", header = TRUE, sep = ",")
routes <- read.csv("data/google_bus/routes.txt", header = TRUE, sep = ",")
calendar <- read.csv("data/google_bus/calendar.txt", header = TRUE, sep = ",")
calendar_dates <- read.csv("data/google_bus/calendar_dates.txt", header = TRUE, sep = ",")

# Import Swiftly data
swiftly <- st_read("data/SwiftlyShapes/geoJSONs/SEPTASurface_HighResSpeed_v2.json")

```

```{r total trip, message=FALSE, warning=FALSE, cache=TRUE}
# Join stop_times and trips, from now on dat will be our main data set
dat <- left_join(stop_times, trips, by = "trip_id")

# join dat and routes
dat <- left_join(dat, routes, by = "route_id")

#determine route_type
dat <- dat %>%
  mutate(mode = if_else(route_type == 0, "Trolley", ifelse(route_type == 1, "Metro", ifelse(route_type == 3, "Bus", "Trolleybus"))))

# Mode share (not ridership)
dat_trip_by_mode <- dat %>%
  group_by(mode) %>%
  summarise(mode_count = n(), .groups = 'drop') %>%
  mutate(total_count = sum(mode_count),
         mode_share = round(mode_count/sum(mode_count)*100, digits = 2)) %>%
  ungroup() 
  
# # Plot mode share using ggplot2 and the custom palette
ggplot(dat_trip_by_mode, aes(x = mode, y = mode_share, fill = mode)) +
  geom_bar(stat = "identity", width = 0.75) +
  scale_fill_manual(values = bus_revolution_palette(length(unique(dat_trip_by_mode$mode)))) +
  geom_text(aes(label = paste0(mode_share, "% (", mode_count, " trips)")),
            vjust = -0.5, color = "black", size = 2.5) +
  labs(title = "Mode Share by Transit Type",
       x = "Mode",
       y = "Mode Share (%)",
       fill = "Mode",
       caption = "Source: SEPTA GTFS Data 2024") +
  theme_minimal() +
  theme(legend.position = "right",
        plot.title = element_text(hjust = 0.5))
# 
# # total trip bus & trolley bus 1.980.505, total observations 2.026.677
# 
# ggplot() +
#   # Add service period background rectangles with reduced alpha for subtlety
#   geom_rect(data = service_periods, aes(xmin = start_hour, xmax = end_hour, ymin = 0, ymax = max(hourly_trips$number_of_trips), fill = peak_category), alpha = 0.2) +
#   # Add the line plot for weekday trips with adjusted point sizes
#   geom_line(data = hourly_trips %>% filter(final_day_type == "Weekday"), aes(x = hour, y = number_of_trips), color = "#F2AE2E", size = 1) +
#   geom_point(data = hourly_trips %>% filter(final_day_type == "Weekday"), aes(x = hour, y = number_of_trips), color = "#26A699", size = 3) +
#   # Add the line plot for weekend trips with a new color for better differentiation
#   geom_line(data = hourly_trips %>% filter(final_day_type == "Weekend"), aes(x = hour, y = number_of_trips), color = "#0072B2", size = 1) +
#   geom_point(data = hourly_trips %>% filter(final_day_type == "Weekend"), aes(x = hour, y = number_of_trips), color = "#D55E00", size = 3) +
#   # Define scales and labels with appropriate adjustments
#   scale_x_continuous(breaks = 0:23) +
#   scale_y_continuous(labels = scales::comma) +
#   scale_fill_manual(values = service_period_colors) +
#   scale_color_manual(values = c("Weekday" = "#F2AE2E", "Weekend" = "#0072B2")) +
#   labs(title = "Number of Trips by Hour with Service Periods",
#        x = "Hour of Day",
#        y = "Number of Trips",
#        fill = "Service Period",
#        color = "Day Type") +
#   theme_minimal() +
#   theme(legend.position = "bottom",
#         plot.title = element_text(hjust = 0.5))
```

```{r peak time, message=FALSE, warning=FALSE, cache=TRUE, results = 'hide'}
# CREATE LINE GRAPH OF TRIPS BY SERVICE PERIODS

# Function to normalize times greater than 24:00:00. Important step: check the stop_times$arrival_time and departure_time values if it's greater than 24:00 
normalize_time <- function(time_str) {
  time_parts <- strsplit(time_str, ":")[[1]]
  hours <- as.numeric(time_parts[1])
  minutes <- as.numeric(time_parts[2])
  seconds <- as.numeric(time_parts[3])
  
  if (hours >= 24) {
    hours <- hours - 24
    time_str <- sprintf("%02d:%02d:%02d", hours, minutes, seconds)
  }
  
  return(time_str)
}

#identify peak_pm category 
dat <- dat %>%
  mutate(departure_time = sapply(departure_time, normalize_time), # departure_time's normalized
         departure_time = lubridate::hms(departure_time),
         hour = hour(departure_time),  # Extract hour component for easier condition checking
         peak_category = if_else(hour >= 4 & hour < 6, "Early Morning",
                                    if_else(hour >= 6 & hour < 9, "AM Peak",
                                            if_else(hour >= 9 & hour < 15, "Mid Day",
                                                    if_else(hour >= 15 & hour < 18, "PM Peak",
                                                            if_else(hour >= 18 & hour < 22, "Evening", 
                                                                    ifelse(hour >= 22 & hour < 24, "Late Night", "Owl"))))))) 

# # Check if there are any NA values in the 'departure_time' column
# summarise(dat,
#     na_count = sum(is.na(departure_time)), #Count NA values
#     total_rows = n(),  # Total number of rows for context
#     percentage_na = na_count / total_rows * 100  # Percentage of NA values
#   )

durations <- c("Owl" = 4, "Early Morning" = 2, "AM Peak" = 3, "Mid Day" = 6, "PM Peak" = 3, "Evening" = 4, "Late Night" = 2)

dat_service_period <- dat %>%
  filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
  group_by(peak_category)%>%
  summarise(peak_count = n(), .groups = 'drop') %>%
  mutate(total_count = sum(peak_count),
         peak_share = round(peak_count/sum(peak_count)*100, digits = 2)) %>%
  arrange(factor(peak_category, levels = c("Owl", "Early Morning", "AM Peak", "Mid Day", "PM Peak", "Evening", "Late Night"))) %>%
  ungroup()

# Create a data frame for service periods
service_periods <- data.frame(
  peak_category = c("Owl", "Early Morning", "AM Peak", "Mid Day", "PM Peak", "Evening", "Late Night"),
  start_hour = c(0, 4, 6, 9, 15, 18, 22),
  end_hour = c(4, 6, 9, 15, 18, 22, 24)
)

# CREATE BAR CHART OF TRIPS BY SERVICE TYPES

# Ensure 'calendar' has 'day_type' defined
calendar <- calendar %>%
  mutate(
    day_type = case_when(
      monday == 1 & tuesday == 1 & wednesday == 1 & thursday == 1 & friday == 1 & saturday == 0 & sunday == 0 ~ "Weekday",
      (saturday == 1 | sunday == 1) ~ "Weekend",
      TRUE ~ "irregular"
    )
  )

# Calendar_dates includes 'service_id', 'date', and 'exception_type'
# Ensure 'calendar_dates' has 'day_type' defined based on the date
calendar_dates <- calendar_dates %>%
  filter(exception_type == 1) %>% #we only need exception_type == 1, those added to the service
  mutate(
    date = ymd(date),  # Convert date format if necessary
    day_of_week = wday(date, label = TRUE, week_start = 1),
    day_type = case_when(
      day_of_week %in% c("Mon", "Tue", "Wed", "Thu", "Fri") ~ "Weekday",  # Monday to Friday
      TRUE ~ "Weekend"  # Saturday and Sunday
    )
  )

# Join calendar_dates with calendar
calendar_joined <- calendar_dates %>%
  full_join(calendar, by = "service_id", suffix = c(".dates", ".cal")) %>%
  mutate(
    final_day_type = case_when(
      # Use calendar_dates' day_type if exception_type indicates added service
      exception_type == 1 & day_type.dates == "Weekday" ~ "Weekday",
      exception_type == 1 & day_type.dates == "Weekend" ~ "Weekend",
      # Default to calendar's day_type where there's no specific exception noted
      TRUE ~ day_type.cal
    )
  ) %>%
  select(service_id, final_day_type) %>%
  group_by(service_id, final_day_type) %>%
  summarise(count = n(), .groups = 'drop') %>%
  arrange(service_id, desc(count)) %>%
  distinct(service_id, .keep_all = TRUE) %>% # Make sure no double day types for service ID
  select(-count)
  
dat <- dat %>%
  left_join(calendar_joined, by = "service_id")

```

```{r calendar, eval=FALSE, include=FALSE}
dat_service_type <- dat %>%
  filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
  group_by(final_day_type)%>%
  summarise(day_type_count = n(), .groups = 'drop') %>%
  mutate(total_count = sum(day_type_count),
         peak_share = round(day_type_count/sum(day_type_count)*100, digits = 2)) %>%
  arrange(factor(final_day_type, levels = c("Weekday", "Weekend"))) %>%
  ungroup()

# Summarize data to get the number of trips per hour
hourly_trips <- dat %>%
  filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
  group_by(hour, final_day_type) %>%
  summarise(number_of_trips = n())

# Define a custom palette for the service periods
service_period_colors <- c(
  "Owl" = "#E5E5E5",
  "Early Morning" = "#C5E0B4",
  "AM Peak" = "#A9D08E",
  "Mid Day" = "#FFD966",
  "PM Peak" = "#F4B084",
  "Evening" = "#D5A6BD",
  "Late Night" = "#A4C2F4"
)

# # Plot the combined graph
# ggplot() +
#   # Add service period background rectangles
#   geom_rect(data = service_periods, aes(xmin = start_hour, xmax = end_hour, ymin = 0, ymax = max(hourly_trips$number_of_trips), fill = peak_category), alpha = 0.3) +
#   # Add the line plot for weekday trips
#   geom_line(data = hourly_trips %>% filter(final_day_type == "Weekday"), aes(x = hour, y = number_of_trips), color = "#F2AE2E", size = 1) +
#   geom_point(data = hourly_trips %>% filter(final_day_type == "Weekday"), aes(x = hour, y = number_of_trips), color = "#26A699", size = 2) +
#   # Add the line plot for weekend trips
#   geom_line(data = hourly_trips %>% filter(final_day_type == "Weekend"), aes(x = hour, y = number_of_trips), color = "#A6123A", size = 1) +
#   geom_point(data = hourly_trips %>% filter(final_day_type == "Weekend"), aes(x = hour, y = number_of_trips), color = "#26A699", size = 2) +
#   # Define scales and labels
#   scale_x_continuous(breaks = 0:23) +
#   scale_y_continuous(labels = comma) +
#   scale_fill_manual(values = service_period_colors) +
#   scale_color_manual(values = c("Weekday" = "#F2AE2E", 
#                                 "Weekend" = "#A6123A")) +
#   labs(title = "Number of Trips by Hour with Service Periods",
#        x = "Hour of Day",
#        y = "Number of Trips",
#        fill = "Service Period",
#        color = "Day Type") +
#   theme_minimal() +
#   theme(legend.position = "bottom",
#         plot.title = element_text(hjust = 0.5))
  
```


```{r set up stops category, message=FALSE, warning=FALSE, cache=TRUE, results = 'hide'}
# SET UP STOPS CATEGORY. This step is crucial as we'll use stops information and join them for each kind of maps 

# Convert stops data to sf object
stops <- stops %>%
  st_as_sf(coords = c("stop_lon", "stop_lat"), crs = 4326, remove = FALSE)

# Load PA and SEPTA region map for base map
PA <- st_read("data/PaCounty2024_05.geojson")
SEPTA_region_map <- PA[PA$COUNTY_NAM %in% c("MONTGOMERY", "BUCKS", "PHILADELPHIA", "DELAWARE", "CHESTER"), ] %>%
  select(PA_CTY_COD, COUNTY_NAM, FIPS_COUNT)

# Load Center City boundary
center_city <- st_read("data/20240708_CenterCity/CenterCity.shp")

# Ensure all datasets are in the same CRS
SEPTA_region_map <- st_transform(SEPTA_region_map, crs = st_crs(stops))
center_city <- st_transform(center_city, crs = st_crs(stops))

# Perform the spatial intersection for county
stops <- stops %>%
  st_join(SEPTA_region_map, left = TRUE, join = st_intersects) %>%
  rename(county = COUNTY_NAM) %>%
  mutate(center_city = st_within(stops, center_city, sparse = FALSE))%>%
  mutate(center_city = rowSums(center_city) > 0)  # Convert logical matrix to vector

# Categorize the stops
stops <- stops %>%
  mutate(category = case_when(
    county == "PHILADELPHIA" & center_city == TRUE ~ "PHILADELPHIA-Center_City",
    county == "PHILADELPHIA" & center_city == FALSE ~ "PHILADELPHIA-Not_Center_City",
    TRUE ~ county
  )) %>%
  select(-"center_city")
  
```

```{r hourly stop data, message=FALSE, warning=FALSE, cache=TRUE, results = 'hide'}
# SET UP HOURLY AVERAGE TRIP DATA FOR EACH BUS STOP

dat_trip_count_hr <- dat %>%
  filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
  group_by(stop_id, mode, final_day_type) %>%
  summarise(trip_count = n(), 
            All_Routes = paste(unique(route_id), collapse = ", "),
            .groups = 'drop') %>%
  mutate(avg_trip_hr = round(trip_count/24)) %>%
  mutate(log_avg_trip_hr = round(log1p(avg_trip_hr), digits = 0)) %>%
  mutate(max_freq = ifelse(avg_trip_hr >= 12, "5 MAX",
                           ifelse (avg_trip_hr >= 6, "10 MAX",
                           ifelse (avg_trip_hr >= 4, "15 MAX", 
                           ifelse(avg_trip_hr >= 2, "30 MAX", "60 MAX")))))

dat_trip_count_hr <- left_join(dat_trip_count_hr, stops, by = "stop_id")

dat_trip_count_hr <- st_as_sf(dat_trip_count_hr, coords = c("stop_lon", "stop_lat"), crs = 4326, remove = FALSE)

# Assuming 'county' is the column with the county information
dat_trip_count_hr <- dat_trip_count_hr %>%
  mutate(category = if_else(county == "PHILADELPHIA", "urban", "suburban"))

# Simplify the max frequency into two categories
dat_trip_count_hr <- dat_trip_count_hr %>%
  mutate(simplified_freq = case_when(
    max_freq %in% c("5 MAX", "10 MAX", "15 MAX") ~ "15 minutes or less",
    TRUE ~ "More than 15 minutes"
  ))

# Remove rows where 'category' is NA
dat_trip_count_hr <- dat_trip_count_hr %>% filter(!is.na(category))

# sum(dat_trip_count_hr$trip_count) #1980505 <- Always make sure you retain the number of observation

```

```{r eval=FALSE, include=FALSE}
ggplot(dat_trip_by_mode, aes(x = mode, y = mode_share, fill = mode)) +
  geom_bar(stat = "identity", width = 0.75) +
  scale_fill_manual(values = bus_revolution_palette(length(unique(dat_trip_by_mode$mode)))) +
  geom_text(aes(label = paste0(mode_share, "% (", mode_count, " trips)")),
            vjust = -0.5, color = "black", size = 2.5) +
  labs(title = "Mode Share by Transit Type",
       x = "Mode",
       y = "Mode Share (%)",
       fill = "Mode") +
  theme_minimal() +
  theme(legend.position = "right",
        plot.title = element_text(hjust = 0.5))
```


### Unraveling the Disparities in Bus Service Efficiency

The disparities in bus service efficiency between urban and suburban areas are starkly highlighted by frequency data. In suburban areas, the majority of bus stops suffer from infrequent services. This contrasts sharply with urban areas, where bus stops generally provide more frequent services. These differences underline significant inequities in transit accessibility, impacting daily commuters like Midge who rely heavily on these services.

In the following leaflet map, we illustrate these disparities. You'll see that bus stops in suburban areas predominantly have longer waiting times, <span style="color: #655BA6;">exceeding 15 minutes</span>, whereas urban stops typically has waiting time <span style="color: #A6123A;">under 15 minutes</span>, demonstrating more frequent services. This visual representation emphasizes the urgent need for targeted improvements in suburban transit systems to bridge the gap and enhance overall service reliability and efficiency.

```{r service period stop data, message=FALSE, warning=FALSE, cache=FALSE}
# SET UP AVERAGE TRIP DATA PER SERVICE PERIOD FOR EACH BUS STOP

dat_trip_count_prd <- dat %>%
  filter(mode %in% c("Bus", "Trolleybus", "Trolley")) %>%
  group_by(stop_id, peak_category, mode, final_day_type) %>%
  summarise(trip_count = n(), 
            All_Routes = paste(unique(route_id), collapse = ", "),
            .groups = 'drop') %>%
  mutate(
    hours = durations[peak_category],
    avg_trip_hr = round(trip_count / hours)
  ) %>%
  select(-hours) %>%
  mutate(log_avg_trip_hr = round(log1p(avg_trip_hr), digits = 0)) %>%
  mutate(max_freq = ifelse(avg_trip_hr >= 12, "5 MAX",
                           ifelse (avg_trip_hr >= 6, "10 MAX",
                           ifelse (avg_trip_hr >= 4, "15 MAX", 
                           ifelse(avg_trip_hr >= 2, "30 MAX", "60 MAX")))))

dat_trip_count_prd <- left_join(dat_trip_count_prd, stops, by = "stop_id")

dat_trip_count_prd <- st_as_sf(dat_trip_count_prd, coords = c("stop_lon", "stop_lat"), crs = 4326, remove = FALSE)

# sum(dat_trip_count_prd$trip_count)  #1980505 <- Always make sure you retain the number of observation

SEPTA_region_map <- st_transform(SEPTA_region_map, st_crs(dat_trip_count_hr))
```

```{r average bus hourly maxfreq, eval=FALSE, message=FALSE, warning=FALSE, cache=FALSE, include=FALSE}
# # Define color palette
# pal <- colorNumeric(palette = viridisLite::viridis(256, option = "C"), domain = dat_trip_count_hr$avg_trip_hr)

# Define the color codes for bus revolution and reverse them
bus_revolution_colors <- rev(c("#A6123A", "#655BA6", "#26A699", "#F2AE2E", "#F2E5D5"))

# Ensure max_freq is a factor with levels matching the colors
dat_trip_count_hr$max_freq <- factor(dat_trip_count_hr$max_freq, levels = c("5 MAX", "10 MAX", "15 MAX", "30 MAX", "60 MAX"))

# Define the color palette for max_freq
max_freq_colors <- c("5 MAX" = "#4B0082", "10 MAX" = "#FF4500", "15 MAX" = "#A6123A", "30 MAX" = "#26A699", "60 MAX" = "#F2AE2E")
pal_max_freq <- colorFactor(palette = max_freq_colors, domain = dat_trip_count_hr$max_freq)

# # Generate the list for overlayGroups with the desired order
# overlay_groups <- expand.grid(final_day_type = c("Weekday", "Weekend"), peak_category = peak_order) %>%
#   mutate(group_name = paste(final_day_type, peak_category, sep = "_")) %>%
#   pull(group_name)

# # Create the leaflet map
# m_avg_trip_hr <- leaflet() %>%
#   addProviderTiles(providers$CartoDB.Positron) %>%
#   addPolygons(data = SEPTA_region_map, color = "black", weight = 1, fill = FALSE) %>%
#   addCircleMarkers(data = dat_trip_count_hr %>% filter(final_day_type == "Weekday"),
#                    group = "Weekday",
#                    lng = ~stop_lon, lat = ~stop_lat,
#                    color = ~pal_max_freq(max_freq),
#                    radius = 1,
#                    opacity = 0.8,
#                    popup = ~paste0("<b>Stop ID:</b> ", stop_id, "<br>",
#                                    "<b>Stop Name:</b> ", stop_name, "<br>",
#                                    "<b>Avg Buses/Hr:</b> ", avg_trip_hr, "<br>",
#                                    "<b>Max Frequency:</b> ", max_freq, "<br>",
#                                    "<b>All Routes:</b> ", All_Routes, "<br>",
#                                    "<b>Day Type:</b> ", final_day_type)) %>%
#   addCircleMarkers(data = dat_trip_count_hr %>% filter(final_day_type == "Weekend"),
#                    group = "Weekend",
#                    lng = ~stop_lon, lat = ~stop_lat,
#                    color = ~pal_max_freq(max_freq),
#                    radius = 1,
#                    opacity = 0.8,
#                    popup = ~paste0("<b>Stop ID:</b> ", stop_id, "<br>",
#                                    "<b>Stop Name:</b> ", stop_name, "<br>",
#                                    "<b>Avg Buses/Hr:</b> ", avg_trip_hr, "<br>",
#                                    "<b>Max Frequency:</b> ", max_freq, "<br>",
#                                    "<b>All Routes:</b> ", All_Routes, "<br>",
#                                    "<b>Day Type:</b> ", final_day_type)) %>%
#   addLayersControl(
#     overlayGroups = c("Weekday", "Weekend"),
#     options = layersControlOptions(collapsed = FALSE)
#   ) %>%
#   hideGroup(c("Weekend")) %>%
#   addLegend(pal = pal_max_freq, values = dat_trip_count_hr$max_freq, title = "Max Frequency",
#             position = "bottomright")
# 
# m_avg_trip_hr
```

```{r average bus hourly simplified, message=FALSE, warning=FALSE, cache=FALSE}
# Define color palette for simplified frequency using the bus revolution colors
pal <- colorFactor(c("#A6123A", "#655BA6"), domain = c("15 minutes or less", "More than 15 minutes"))

# Create the leaflet map
m_avg_trip_hr <- leaflet() %>%
  addProviderTiles(providers$CartoDB.Positron) %>%
  addPolygons(data = SEPTA_region_map, color = "black", weight = 1, fill = FALSE) %>%
  addCircleMarkers(
    data = dat_trip_count_hr %>% filter(final_day_type == "Weekday" & !is.na(category)),
    group = ~category,  # Group data based on urban or suburban
    lng = ~stop_lon, 
    lat = ~stop_lat,
    color = ~pal(simplified_freq),  # Color based on simplified frequency
    radius = 1,
    opacity = 0.8,
    popup = ~paste0("<b>Stop ID:</b> ", stop_id, "<br>",
                    "<b>Stop Name:</b> ", stop_name, "<br>",
                    "<b>Avg Buses/Hr:</b> ", avg_trip_hr, "<br>",
                    "<b>Max Frequency:</b> ", simplified_freq, "<br>",
                    "<b>All Routes:</b> ", All_Routes)
  ) %>%
  addLayersControl(
    overlayGroups = c("urban", "suburban"),
    options = layersControlOptions(collapsed = FALSE)
  ) %>%
  addControl(html = "<div style='background-color: white; padding: 5px;'><strong>Source:</strong> SEPTA GTFS Data 2024</div>", position = "bottomleft")

# Display the map
m_avg_trip_hr

```

Midge’s daily challenge with inconsistent suburban bus service reflects broader transit inequities between urban and suburban settings. This disparity is exemplified by the operational inefficiencies of suburban routes, highlighted by data indicating these buses frequently fall within the lowest 15th percentile for service performance. This low percentile indicates prolonged wait times and unreliable schedules—factors contributing to higher operational costs and reduced reliability compared to more efficient urban counterparts.

<span style="color: #655BA6;">Suburban buses</span>, like the routes Midge uses, often suffer due to lower ridership densities and less optimized route planning, as reflected in the significant number of suburban routes like Routes 118, 129, and 90 underperforming in the 15th percentile. These routes' inefficiencies starkly contrast with the robust transit systems in urban areas, underscoring a critical gap in transit equity and efficiency that demands innovative solutions.

```{r scatterplot, message=FALSE, warning=FALSE, cache=FALSE}

route_evaluation <- read.csv("data/route_evaluation_report_2019.csv", sep = ";")

# Convert Cost.per.Passenger and Passengers.per.Revenue.Hour into proper numeric columns
route_evaluation$Cost.per.Passenger <- as.numeric(gsub("[$,]", "", route_evaluation$Cost.per.Passenger))
route_evaluation$Passengers.per.Revenue.Hour <- as.numeric(route_evaluation$Passengers.per.Revenue.Hour)

# Calculate 15th percentile for Passengers and 85th percentile for Cost (due to reversed axis)
passenger_15th <- quantile(route_evaluation$Passengers.per.Revenue.Hour, probs = 0.15, na.rm = TRUE)
cost_85th <- quantile(route_evaluation$Cost.per.Passenger, probs = 0.85, na.rm = TRUE)

# Define color palette for the types of buses
route_colors <- c("City Bus" = "#A6123A", "Suburban Bus" = "#655BA6")
pal <- colorFactor(route_colors, domain = unique(route_evaluation$Type))

# Create the ggplot
p <- ggplot(route_evaluation, aes(x = Passengers.per.Revenue.Hour, y = Cost.per.Passenger, color = Type, text = paste("Route:", Route, "<br>Passengers per Revenue Hour:", Passengers.per.Revenue.Hour, "<br>Cost per Passenger:", Cost.per.Passenger, "<br>Type:", Type))) +
  annotate("rect", xmin = 0, xmax = passenger_15th, ymin = max(route_evaluation$Cost.per.Passenger), ymax = cost_85th, fill = "lightcoral", alpha = 0.2) +
  geom_vline(xintercept = passenger_15th, linetype = "dashed", color = "grey") +
  geom_hline(yintercept = cost_85th, linetype = "dashed", color = "grey") +
  geom_point() +
  labs(title = "Performance of SEPTA City Bus and Suburban Bus",
       x = "Passengers per Revenue Hour",
       y = "Cost per Passenger",
       caption = "Source: SEPTA FY 2019 Report") +
  scale_color_manual(values = route_colors) +
  scale_y_reverse(breaks = seq(0, max(route_evaluation$Cost.per.Passenger, na.rm = TRUE), by = 1),
                  labels = dollar_format(prefix = "$", suffix = "", accuracy = 0.01)) +
  scale_x_continuous(breaks = seq(0, max(route_evaluation$Passengers.per.Revenue.Hour, na.rm = TRUE), by = 10)) +
  theme_minimal() +
  theme(legend.position = "none") +
  annotate("text", x = passenger_15th / 2, y = 6.5, label = "15th percentile", size = 3, color = "black", alpha = 0.5)

# Convert to plotly for an interactive plot with detailed hover information
p_plotly <- ggplotly(p, tooltip = "text")

# Display the interactive plot
p_plotly

```

Although microtransit has been proposed as an alternative to address these inefficiencies by offering more adaptable and potentially efficient services in lower-density areas, its effectiveness remains questionable. For example, AC Transit's experiment in Oakland to replace fixed routes with microtransit resulted in a cost per passenger that more than doubled, highlighting the challenges of implementing such services effectively.[^9]

This data illustrates that while microtransit is often touted as a modern solution for declining ridership, it generally does not perform as efficiently as traditional fixed-route services. The core efficiency of fixed-route transit comes from its ability to consolidate riders along a predictable path, minimizing deviations and ensuring consistent travel times. In contrast, microtransit's on-demand nature can lead to significant inefficiencies, with routes constantly changing to accommodate individual pickup requests, leading to increased travel times and operational costs.

Given these challenges, it might be more prudent to explore other alternatives, such as deploying smaller, fixed-route minibusses during peak hours to enhance connectivity and manage crowding more effectively. This approach could leverage the lower operational costs of smaller vehicles while maintaining the structured efficiency of fixed-route transit, potentially offering a more sustainable solution to the suburban transit woes that Midge and countless others face daily.

[^9]: [Urgo 2019](https://www.apta.com/wp-content/uploads/Conferences_Meetings_Events/Presentations/2018-Sustainability/Urgo-John.pdf#page=24)

```{r static scatterplot, eval=FALSE, include=FALSE}
# # Assuming passenger_15th and cost_85th are your calculated 15th percentile values
# ggplot(route_evaluation, aes(x = Passengers.per.Revenue.Hour, y = Cost.per.Passenger, color = Type)) +
#   # Add a rectangle and text annotation for the 15th percentile
#   annotate("rect", xmin = 0, xmax = passenger_15th, ymin = max(route_evaluation$Cost.per.Passenger), ymax = cost_85th,
#            fill = "red", alpha = 0.2) +
#   geom_point() +
#   labs(title = "Performance of SEPTA City Bus and Suburban Bus",
#        x = "Passengers per Revenue Hour",
#        y = "Cost per Passenger") +
#   scale_color_brewer(palette = "Set1") +
#   scale_y_reverse(breaks = seq(0, max(route_evaluation$Cost.per.Passenger, na.rm = TRUE), by = 1),
#                   labels = dollar_format(prefix = "$", suffix = "", accuracy = 0.01)) +
#   scale_x_continuous(breaks = seq(0, max(route_evaluation$Passengers.per.Revenue.Hour, na.rm = TRUE), by = 10)) +
#   theme_minimal() +
#   geom_vline(xintercept = passenger_15th, linetype = "dashed", color = "grey") +
#   geom_hline(yintercept = cost_85th, linetype = "dashed", color = "grey") +
#   theme(legend.position = "none", plot.title = element_text(hjust = 0.5)) +
#   annotate("text", x = passenger_15th / 2, y = 6.5, label = "15th percentile",
#            size = 3, color = "black")
```


### Impact and Innovation: Enhancing Suburban Bus Services

While the raw data illuminates the service disparities, the true depth of this issue is best captured through the lived experiences of suburban commuters like Midge. Personal stories reveal the broader human impact: missed opportunities, prolonged workdays, and diminished family life due to infrequent bus services. This qualitative insight paints a vivid picture of the daily challenges and underscores the urgency for targeted improvements.

SEPTA's ongoing "Bus Revolution" project aims to modernize and enhance bus services, promising updated routes and improved frequencies that haven't been revised since the 1960s. While the initiative seeks to reduce headways to a maximum of 30 minutes even in the most remote suburban areas, it has faced delays and criticism, particularly concerning its potential to disproportionately impact low-income neighborhoods. This tension highlights the complex interplay between improving service efficiency and addressing community concerns of gentrification.

As we conclude our exploration of suburban bus transit disparities in Philadelphia, the compelling narratives of daily commuters, underpinned by robust SEPTA data, paint a clear picture of the challenges and the urgent need for systemic improvements. The "Bus Revolution" project, while a promising step toward enhancing bus frequencies and updating decades-old routes, also highlights the delicate balance required to improve service efficiency without exacerbating social inequities.

Embracing innovative solutions and fostering community engagement in transit planning are crucial. By committing to both technological upgrades and equitable service distribution, SEPTA can ensure that every commuter, regardless of where they live, has reliable and timely access to public transit. This not only improves individual daily experiences but also supports broader goals of economic mobility and sustainable urban development.

As stakeholders, we are called upon to support these initiatives, advocate for transparent and inclusive planning processes, and ensure that the voices of all commuters, especially those most affected like Midge, are heard and addressed in the pursuit of a truly connected and equitable transit system.